Audio-Visual Speaker Verification using Continuous Fused HMMs

Authors

  • David Dean
  • Sridha Sridharan
  • Tim Wark
Abstract

This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than video. Although temporal information allows a HMM to outperform a GMM individually in video, this temporal information does not carry through to output fusion with an audio classifier, where the difference between the two video classifiers is minor. An adaptation of fused hidden Markov models, designed to be more robust to within-speaker variation, is used to show that the temporal relationship between video observations and audio states can be harnessed to reduce errors in audio-visual speaker verification when compared to output fusion.
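
The output fusion that the abstract uses as its baseline can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors used, so the weighted sum below is an assumption, as are the names `fuse_scores` and `verify`, the default `audio_weight`, and the threshold. The scores are assumed to be per-modality verification scores (for example, log-likelihood ratios) already normalised to comparable ranges.

```python
def fuse_scores(audio_score: float, video_score: float,
                audio_weight: float = 0.7) -> float:
    """Weighted-sum fusion of two per-modality verification scores.

    Assumption: both scores are already normalised to comparable ranges.
    audio_weight is a hypothetical tuning parameter, not a value from the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    # The video modality receives the remaining weight.
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


def verify(audio_score: float, video_score: float,
           threshold: float = 0.0, audio_weight: float = 0.7) -> bool:
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(audio_score, video_score, audio_weight) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for a single verification trial.
    fused = fuse_scores(2.0, -1.0, audio_weight=0.5)
    print("fused score:", fused)
    print("accepted:", verify(2.0, -1.0, threshold=0.0, audio_weight=0.5))
```

In a sketch like this, the audio and video classifiers are trained and scored independently, so any temporal coupling between the two streams is lost before fusion. That is the limitation the fused HMM adaptation in the abstract is meant to address.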


Similar resources

A New Approach to Integrate Audio and Visual Features of Speech

This paper presents a novel fused-hidden Markov model (fused-HMM) to integrate the audio and visual features of speech. In this model, audio and visual HMMs built individually are fused together using a general probabilistic fusion method, which is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observati...


Transition-oriented hidden Markov models for speaker verification

In this article, we present a novel mechanism by which more precise voiceprints can be constructed in a typical text-dependent speaker verification system based on a continuous density hidden Markov model (HMM). Typical voiceprints (speaker-dependent HMMs) are first trained using a subscriber's enrollment data. The resulting models are then restructured to permit a modeling of sub-state behavior. ...


An Examination of Audio-visual Fused HMMs for Speaker Recognition

Fused hidden Markov models (FHMMs) have been shown to work well for the task of audio-visual speaker recognition, but only in an output decision-fusion configuration of both the audio- and video-biased versions of the FHMM structure. This paper looks at the performance of the audio- and video-biased versions independently, and shows that the audio-biased version is considerably more capable for spe...


Robust speaker verification insensitive to session-dependent utterance variation and handset-dependent distortion

This paper investigates a method of creating robust speaker models that are not sensitive to session-dependent (SD) utterance-variation and handset-dependent (HD) distortion for hidden Markov model (HMM)-based speaker verification systems in a real telephone network. We recently reported a method of creating session-independent (SI) speaker-HMMs that are not sensitive to SD utterance-variation. ...


Fused HMM adaptation of synchronous HMMs for audio-visual speaker verification

A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper, we will show that instead of being treated as a separate modelling technique, FHMMs can be adopted as a novel method of training synchronous hidden Markov models (SHMMs). SHMMs are traditionally jointly trained on bot...



Publication date: 2006